System and method for clustering multimedia content elements

ABSTRACT

A system and method for clustering multimedia content. The method includes: detecting at least one clustering trigger event related to at least one multimedia content element to be clustered; generating at least one signature for the at least one multimedia content element, each signature representing at least a portion of the at least one multimedia content element; determining, based on the generated at least one signature, at least one multimedia content element cluster, wherein each multimedia content element cluster includes a plurality of clustered multimedia content elements sharing at least one common concept with the at least one multimedia content element; and adding, to each determined cluster, the at least one multimedia content element.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 15/420,989 filed on Jan. 31, 2017, which claims the benefit of U.S. Provisional Application No. 62/307,515 filed on Mar. 13, 2016.

The contents of the above-referenced applications are hereby incorporated by reference.

TECHNICAL FIELD

The present disclosure relates generally to organizing multimedia content, and more specifically to clustering based on analysis of multimedia content elements.

BACKGROUND

As the Internet continues to grow exponentially in size and content, the task of finding relevant and appropriate information has become increasingly complex. Organized information can be browsed or searched more quickly than unorganized information. As a result, effective organization of content allowing for subsequent retrieval is becoming increasingly important.

Search engines are often used to search for information, either locally or over the World Wide Web. Many search engines receive queries from users and use such queries to find and return relevant content. The search queries may be in the form of, for example, textual queries, images, audio queries, etc.

Search engines often face challenges when searching for multimedia content (e.g., images, audio, videos, etc.). In particular, existing solutions for searching for multimedia content are typically based on metadata of multimedia content elements. Such metadata may be associated with a multimedia content element and may include parameters such as, for example, size, type, name, short description, tags describing articles or subject matter of the multimedia content element, and the like. A tag is a non-hierarchical keyword or term assigned to data (e.g., multimedia content elements). The name, tags, and short description are typically manually provided by, e.g., the creator of the multimedia content element (for example, a user who captured the image using his smart phone), a person storing the multimedia content element in a storage, and the like.

Tagging has gained widespread popularity in part due to the growth of social networking, photograph sharing, and bookmarking of websites. Some websites allow users to create and manage tags that categorize content using simple keywords. The users of such sites manually add and define descriptions used for tags. Some of these websites only allow tagging of specific portions of multimedia content elements (e.g., portions of images showing people). Thus, the tags assigned to a multimedia content element may not fully capture the contents shown therein.

Further, because at least some of the metadata of a multimedia content element is typically provided manually by a user, such metadata may not accurately describe the multimedia content element or facets thereof. For instance, the metadata may be misspelled, may have been provided with respect to a different image than intended, or may be vague or otherwise fail to identify one or more aspects of the multimedia content element. As an example, a user may provide a file name “weekend fun” for an image of a cat, which does not indicate the contents (e.g., the cat) shown in the image. Thus, a query for the term “cat” would not return the “weekend fun” image.

Additionally, different users may utilize different tags to refer to the same subject or topic, thereby resulting in some multimedia content elements related to a particular subject having one tag and other multimedia content elements related to the subject having a different tag. For example, one user may tag images of trees with the term “plants,” while another user tags images of trees with the term “trees.” Thus, a query based on either the tag “plants” or the tag “trees” will only return results including one of the images despite both images being relevant to the query.

It would therefore be advantageous to provide a solution that would overcome the deficiencies of the prior art.

SUMMARY

A summary of several example embodiments of the disclosure follows. This summary is provided for the convenience of the reader to provide a basic understanding of such embodiments and does not wholly define the breadth of the disclosure. This summary is not an extensive overview of all contemplated embodiments, and is intended to neither identify key or critical elements of all embodiments nor to delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more embodiments in a simplified form as a prelude to the more detailed description that is presented later. For convenience, the term “some embodiments” may be used herein to refer to a single embodiment or multiple embodiments of the disclosure.

Some embodiments disclosed herein include a method for clustering multimedia content. The method comprises detecting at least one clustering trigger event related to at least one multimedia content element to be clustered; generating at least one signature for the at least one multimedia content element, each signature representing at least a portion of the at least one multimedia content element; determining, based on the generated at least one signature, at least one multimedia content element cluster, wherein each multimedia content element cluster includes a plurality of clustered multimedia content elements sharing at least one common concept with the at least one multimedia content element; and adding, to each determined cluster, the at least one multimedia content element.

Some embodiments disclosed herein also include a non-transitory computer readable medium having stored thereon instructions for causing a processing circuitry to execute a process, the process comprising: detecting at least one clustering trigger event related to at least one multimedia content element to be clustered; generating at least one signature for the at least one multimedia content element, each signature representing at least a portion of the at least one multimedia content element; determining, based on the generated at least one signature, at least one multimedia content element cluster, wherein each multimedia content element cluster includes a plurality of clustered multimedia content elements sharing at least one common concept with the at least one multimedia content element; and adding, to each determined cluster, the at least one multimedia content element.

Some embodiments disclosed herein also include a system for clustering multimedia content. The system comprises a processing circuitry; and a memory, the memory containing instructions that, when executed by the processing circuitry, configure the system to: detect at least one clustering trigger event related to at least one multimedia content element to be clustered; generate at least one signature for the at least one multimedia content element, each signature representing at least a portion of the at least one multimedia content element; determine, based on the generated at least one signature, at least one multimedia content element cluster, wherein each multimedia content element cluster includes a plurality of clustered multimedia content elements sharing at least one common concept with the at least one multimedia content element; and add, to each determined cluster, the at least one multimedia content element.

BRIEF DESCRIPTION OF THE DRAWINGS

The subject matter disclosed herein is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the disclosed embodiments will be apparent from the following detailed description taken in conjunction with the accompanying drawings.

FIG. 1 is a network diagram utilized to describe the various disclosed embodiments.

FIG. 2 is a flowchart illustrating a method for clustering multimedia content elements according to an embodiment.

FIG. 3 is a block diagram depicting the basic flow of information in the signature generator system.

FIG. 4 is a diagram showing the flow of patches generation, response vector generation, and signature generation in a large-scale speech-to-text system.

FIG. 5 is a block diagram illustrating a clustering system according to an embodiment.

DETAILED DESCRIPTION

It is important to note that the embodiments disclosed herein are only examples of the many advantageous uses of the innovative teachings herein. In general, statements made in the specification of the present application do not necessarily limit any of the various claimed embodiments. Moreover, some statements may apply to some inventive features but not to others. In general, unless otherwise indicated, singular elements may be in plural and vice versa with no loss of generality. In the drawings, like numerals refer to like parts throughout the several views.

The various disclosed embodiments include a method and system for clustering multimedia content elements (MMCEs). The clustering allows for organizing and searching of multimedia content elements based on common concepts. In an example embodiment, multimedia content elements to be clustered are obtained. For each multimedia content element, at least one signature is generated. Based on the signatures generated for each multimedia content element, a search tag may be generated. In an embodiment, a plurality of search tags can be generated for each multimedia content element. Each of the multimedia content elements is added to a multimedia content element cluster based on the generated at least one signature, the generated tags, or both. Each multimedia content element cluster includes a plurality of multimedia content elements having at least one concept in common.

In an example embodiment, the common concept among multimedia content elements of a multimedia content element cluster may be a collection of signatures representing elements of the unstructured data and metadata describing the concept. The common concept may represent an item or aspect of the multimedia content elements such as, but not limited to, an object, a person, an animal, a pattern, a color, a background, a character, a sub textual aspect (e.g., an aspect indicating sub textual information such as activities or actions being performed, relationships among individuals shown such as teams or members of an organization, etc.), a meta aspect indicating information about the multimedia content element itself (e.g., an aspect indicating that an image is a “selfie” taken by a person in the image), words, sounds, voices, motions, combinations thereof, and the like. Multimedia content elements may share a common concept when each of the multimedia content elements is associated with at least one signature, at least one portion of a signature, at least one tag, or a combination thereof, that is common to all of the multimedia content elements sharing a common concept.

In an embodiment, the at least one multimedia content element may be clustered based further on metadata associated with a user. The user may be, but is not limited to, a user of a user device in which the at least one multimedia content element is stored. In another embodiment, the clustering may include searching, based on the generated at least one signature, for clusters including multimedia content elements sharing a common concept. The searching may further include comparing the generated at least one signature to signatures of a plurality of multimedia content element clusters to determine matching signatures, where the at least one multimedia content element may be added to a cluster associated with matching signatures.
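The signature comparison described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each signature is represented as the set of indices of its active bits and that two signatures match when their Jaccard overlap reaches a threshold. The names `Cluster`, `jaccard`, and `add_to_matching_clusters` and the 0.6 threshold are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field


def jaccard(a: frozenset, b: frozenset) -> float:
    """Fraction of active bits shared by two signatures (1.0 means identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


@dataclass
class Cluster:
    name: str
    signatures: list = field(default_factory=list)  # signatures of clustered elements
    members: list = field(default_factory=list)     # identifiers of clustered elements


def add_to_matching_clusters(element_id, element_sigs, clusters, threshold=0.6):
    """Add element_id to every cluster holding a signature that matches one of element_sigs."""
    matched = []
    for cluster in clusters:
        if any(jaccard(s, c) >= threshold for s in element_sigs for c in cluster.signatures):
            cluster.members.append(element_id)
            cluster.signatures.extend(element_sigs)
            matched.append(cluster.name)
    return matched


clusters = [
    Cluster("dogs", [frozenset({1, 2, 3, 4})], ["img1"]),
    Cluster("beach", [frozenset({10, 11, 12})], ["img2"]),
]
print(add_to_matching_clusters("img7", [frozenset({1, 2, 3, 4, 5})], clusters))  # ['dogs']
```

Because an element may match several clusters, the function adds it to every matching cluster rather than only the best one, mirroring the possibility of clustering one element into multiple clusters.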

FIG. 1 shows an example network diagram 100 utilized to describe the various embodiments disclosed herein. The example network diagram 100 includes a user device 110, a clustering system 130, a database 150, and a deep content classification (DCC) system 160, communicatively connected via a network 120.

The network 120 is used to communicate between different components of the network diagram 100. The network 120 may be the Internet, the world-wide-web (WWW), a local area network (LAN), a wide area network (WAN), a metro area network (MAN), or any other network capable of enabling communication between the components of the network diagram 100.

The user device 110 may be, but is not limited to, a personal computer (PC), a personal digital assistant (PDA), a mobile phone, a smart phone, a tablet computer, a wearable computing device, a smart television, and other devices configured for storing, viewing, and sending multimedia content elements.

The user device 110 may have installed thereon an application (app) 115. The application 115 may be downloaded from application repositories such as, but not limited to, the AppStore®, Google Play®, or any other repository storing applications. Alternatively, the application 115 may be pre-installed on the user device 110. The application 115 may be, but is not limited to, a mobile application, a virtual application, a web application, a native application, and the like. In an example implementation, the application 115 may be a web browser.

In an embodiment, the clustering system 130 is configured to cluster multimedia content elements. The clustering system 130 typically includes, but is not limited to, a processing circuitry connected to a memory, the memory containing instructions that, when executed by the processing circuitry, configure the clustering system 130 to at least perform clustering of multimedia content elements as described herein. In an embodiment, the processing circuitry may be realized as an array of at least partially statistically independent computational cores, the properties of each core being set independently of the properties of each other core. An example block diagram of the clustering system 130 is described further herein below with respect to FIG. 5.

In an embodiment, the clustering system 130 is configured to initiate clustering of the multimedia content elements upon detection of at least one clustering trigger event. The at least one clustering trigger event may include, but is not limited to, receipt of a request to cluster a multimedia content element or a plurality of multimedia content elements.

To this end, in an embodiment, the clustering system 130 is configured to receive, from the user device 110, a request to cluster a multimedia content element or a plurality of multimedia content elements. Clustering each of the multimedia content elements may include generating a cluster based on two or more multimedia content elements, or adding a multimedia content element to an existing cluster. The request may include, but is not limited to, the multimedia content element or plurality of multimedia content elements, an identifier of one or more of the multimedia content elements, an indicator of a location of one or more of the multimedia content elements (e.g., an indicator of a location in the database 150 in which one or more of the multimedia content elements is stored), a combination thereof, and the like. As non-limiting examples, the request may include an image, an identifier used for finding the image, a location of the image in a storage (e.g., the database 150), or a combination thereof.
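A request of the kind described above could, for example, be modeled as a small record in which any combination of the three fields is present. This Python sketch is hypothetical: the `ClusterRequest` fields and the `resolve` helper are illustrative names, and a plain dictionary stands in for the database 150.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterRequest:
    content: Optional[bytes] = None   # the multimedia content element itself
    element_id: Optional[str] = None  # identifier used for finding the element
    location: Optional[str] = None    # where the element is stored (e.g., a database key)


def resolve(request: ClusterRequest, store: dict) -> bytes:
    """Return the element to be clustered, preferring content sent inline with the request."""
    if request.content is not None:
        return request.content
    for key in (request.location, request.element_id):
        if key is not None and key in store:
            return store[key]
    raise LookupError("request does not identify an available multimedia content element")


# Stand-in for the database 150: storage location -> stored element bytes.
store = {"db150/images/img1.jpg": b"jpeg-bytes"}
print(resolve(ClusterRequest(location="db150/images/img1.jpg"), store))
```

Preferring inline content avoids a storage lookup when the user device 110 already sent the element; the lookup order for location and identifier is an arbitrary choice of this sketch.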

Each multimedia content element may include, but is not limited to, images, graphics, video streams, video clips, audio streams, audio clips, video frames, photographs, images of signals (e.g., spectrograms, phasograms, scalograms, etc.), combinations thereof, portions thereof, and the like. The multimedia content elements may be, e.g., captured via the user device 110.

In an optional embodiment, the clustering system 130 is further communicatively connected to a signature generator system (SGS) 140. In a further embodiment, the clustering system 130 may be configured to send, to the signature generator system 140, the multimedia content element to be clustered. The signature generator system 140 is configured to generate signatures based on the multimedia content element and to send the generated signatures to the clustering system 130. In another embodiment, the clustering system 130 may be configured to generate the signatures. Generation of signatures based on multimedia content elements is described further herein below with respect to FIGS. 3 and 4. In another embodiment, the signatures generated for more than one multimedia content element may be clustered.

In an optional embodiment, the clustering system 130 is further communicatively connected to a deep-content classification (DCC) system 160. The DCC system 160 may be configured to continuously create a knowledge database for multimedia data. To this end, the DCC system 160 may be configured to initially receive a large number of multimedia content elements to create a knowledge database that is condensed into concept structures that are efficient to store, retrieve, and check for matches. As new multimedia content elements are collected by the DCC system 160, they are efficiently added to the knowledge database and concept structures such that the resource requirement is generally sub-linear rather than linear or exponential. The DCC system 160 is configured to extract patterns from each multimedia content element and to select the important/salient patterns for the creation of signatures thereof. The patterns are then inter-matched and clustered, and the number of signatures in each cluster is reduced to a minimum that maintains matching and enables generalization to new multimedia content elements. Metadata respective of the multimedia content elements is collected, thereby forming, together with the reduced clusters, a concept structure.
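The reduction step can be approximated, for illustration only, by a greedy set-cover pass that repeatedly keeps the signature matching the most not-yet-covered members until every member of the cluster matches at least one kept signature. The disclosure does not specify the reduction algorithm; greedy set cover, the set-of-active-bits representation, the Jaccard matching rule, and the 0.5 threshold are all assumptions of this sketch.

```python
def jaccard(a, b):
    """Fraction of active bits shared by two signatures."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def reduce_cluster(signatures, threshold=0.5):
    """Greedily keep representative signatures until every signature in the
    cluster matches at least one representative (a greedy set-cover stand-in)."""
    uncovered = set(range(len(signatures)))
    representatives = []
    while uncovered:
        # Choose the signature that matches the most still-uncovered members.
        best = max(
            range(len(signatures)),
            key=lambda i: sum(1 for j in uncovered
                              if jaccard(signatures[i], signatures[j]) >= threshold),
        )
        representatives.append(signatures[best])
        uncovered -= {j for j in uncovered
                      if jaccard(signatures[best], signatures[j]) >= threshold}
    return representatives


sigs = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({1, 2, 4, 5}),
    frozenset({20, 21, 22}),
]
print(reduce_cluster(sigs))  # [frozenset({1, 2, 3, 4}), frozenset({20, 21, 22})]
```

The loop always terminates because any uncovered signature matches itself, so each pass covers at least one member.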

In a further embodiment, the clustering system 130 may be configured to obtain, from the DCC system 160, at least one concept structure matching each of the multimedia content elements to be clustered. In yet a further embodiment, the clustering system 130 may be configured to query the DCC system 160 for the at least one matching concept structure. The query may be made with respect to the signatures for the multimedia content elements to be clustered. In an embodiment, multimedia content elements associated with the obtained matching concept structures may be utilized for determining clusters to which the multimedia content elements to be clustered are added.

In an optional embodiment, the clustering system 130 is configured to generate, based on the signatures for the multimedia content elements to be clustered, at least one tag for each multimedia content element. Each tag is a textual index term assigned to content. The generated tags are searchable (e.g., by the user device 110 or other user devices), and may be included in metadata for the multimedia content element. In an embodiment, the tags may be generated based on metadata of the obtained at least one concept structure. As a non-limiting example, if metadata of an obtained concept structure includes the word “Superman®”, the generated tags may include the textual term “Superman®”.
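Tag generation from the metadata of matching concept structures might look like the following sketch. The dictionary layout of a concept structure (reduced `signatures` plus `metadata` holding textual `terms`), the Jaccard matching rule, and the threshold are hypothetical choices made for illustration.

```python
def jaccard(a, b):
    """Fraction of active bits shared by two signatures."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def tags_from_concepts(element_sigs, concept_structures, threshold=0.5):
    """Collect the metadata terms of every concept structure that matches the element."""
    tags = []
    for concept in concept_structures:
        if any(jaccard(s, r) >= threshold
               for s in element_sigs for r in concept["signatures"]):
            for term in concept["metadata"].get("terms", []):
                if term not in tags:  # keep tags unique, preserving first-seen order
                    tags.append(term)
    return tags


concepts = [
    {"signatures": [frozenset({1, 2, 3})], "metadata": {"terms": ["Superman"]}},
    {"signatures": [frozenset({7, 8})], "metadata": {"terms": ["beach"]}},
]
print(tags_from_concepts([frozenset({1, 2, 3, 4})], concepts))  # ['Superman']
```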

In an embodiment, based on the generated signatures, the generated tags, or both, the clustering system 130 is configured to determine at least one multimedia content element cluster for each multimedia content element to be clustered. Each determined multimedia content element cluster includes a plurality of multimedia content elements sharing at least one common concept with each other and with the multimedia content element or plurality of multimedia content elements to be clustered. The common concept of a multimedia content element may be a collection of signatures representing elements of the unstructured data and metadata describing the concept. The common concept may represent an item or aspect of the multimedia content element such as, but not limited to, an object, a person, an animal, a pattern, a color, a background, a character, a sub textual aspect, a meta aspect, words, sounds, voices, motions, combinations thereof, and the like. In a further embodiment, multimedia content elements may share a common concept when each of the multimedia content elements is associated with at least one signature, at least one portion of a signature, at least one tag, or a combination thereof, that is common to the multimedia content elements sharing a common concept.

It should be noted that multiple multimedia content element clusters may be determined for each multimedia content element. As a non-limiting example, for an image showing a “selfie” of a person (i.e., an image showing the person that is captured by the person) taken on the beach, multimedia content element clusters including multimedia content elements showing the person, selfies of the person or of other people, and beach scenery may be determined, and the selfie image may be clustered into each of the determined multimedia content element clusters.

In a further embodiment, determining the multimedia content element clusters may include comparing the generated signatures or the generated tags to signatures or tags, respectively, of a plurality of multimedia content element clusters. Each determined multimedia content element cluster may be, e.g., a cluster having signatures or tags that match the generated signatures or tags above a predetermined threshold. As a non-limiting example, a signature is generated based on a video showing a stand-up comedy performance by the comedian Jerry Seinfeld, and the tags “Jerry Seinfeld” and “stand-up comedy” are generated based on that signature. The video may then be added to each cluster whose tags match either of the generated tags above the predetermined threshold. In yet a further embodiment, the determined multimedia content element clusters may include one cluster for each tag.
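For the one-cluster-per-tag embodiment, a minimal sketch could keep a mapping from each tag to the identifiers of its clustered elements. Exact tag matching, the `defaultdict` index, the `cluster_by_tags` name, and the file names are illustrative assumptions; the threshold-based tag matching mentioned above is omitted here.

```python
from collections import defaultdict


def cluster_by_tags(element_id, tags, tag_clusters):
    """Add element_id to the cluster of each generated tag, creating clusters as needed."""
    for tag in tags:
        if element_id not in tag_clusters[tag]:  # defaultdict creates a missing cluster
            tag_clusters[tag].append(element_id)


tag_clusters = defaultdict(list, {"stand-up comedy": ["clip_01.mp4"]})
cluster_by_tags("seinfeld_special.mp4", ["Jerry Seinfeld", "stand-up comedy"], tag_clusters)
print(dict(tag_clusters))
```

The Seinfeld video joins the existing “stand-up comedy” cluster and seeds a new “Jerry Seinfeld” cluster; calling the function again with the same inputs leaves the clusters unchanged.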

In yet a further embodiment, one or more of the multimedia content element clusters may be included in or associated with a concept structure such that the comparison may include comparing the generated signatures or the generated tags to a reduced set of signatures or tags of the concept structure, respectively. In a further embodiment, the multimedia content elements to be clustered may be added to the concept structures having matching multimedia content element clusters.

In another embodiment, if no existing multimedia content element clusters having concepts in common with the multimedia content element can be found (e.g., if no signatures or tags match the generated signatures or tags above a predetermined threshold), the clustering system 130 may be configured to generate a multimedia content element cluster including the multimedia content elements to be clustered. Generating the multimedia content element cluster may include, but is not limited to, searching in one or more data sources (e.g., the user device 110, the database 150, or other data sources not shown that may be accessible over, e.g., the Internet) to identify multimedia content elements sharing common concepts with the multimedia content element. The searching may be based on the generated signatures, the generated tags, or both. The identified multimedia content elements are clustered with the multimedia content element to be clustered, and the resulting cluster may be stored in, e.g., the database 150. In a further embodiment, the generated cluster may further include the generated tags.
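The fallback described above, in which a new cluster is built by searching other data sources when no existing cluster matches, can be sketched as follows. The data layout (clusters as lists of `(element_id, signature)` pairs, a dictionary standing in for a searchable data source such as the database 150), the Jaccard rule, the threshold, and the `cluster-N` naming scheme are all hypothetical.

```python
def jaccard(a, b):
    """Fraction of active bits shared by two signatures."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def matches(element_sigs, sig, threshold):
    return any(jaccard(s, sig) >= threshold for s in element_sigs)


def cluster_or_create(element_id, element_sigs, clusters, data_source, threshold=0.6):
    """clusters maps a cluster name to (element_id, signature) pairs;
    data_source maps element ids to signatures (a stand-in for, e.g., database 150)."""
    matched = [name for name, members in clusters.items()
               if any(matches(element_sigs, sig, threshold) for _, sig in members)]
    if matched:
        for name in matched:
            clusters[name].extend((element_id, s) for s in element_sigs)
        return matched
    # No existing cluster shares a concept: search the data source for related elements.
    related = [(eid, sig) for eid, sig in data_source.items()
               if eid != element_id and matches(element_sigs, sig, threshold)]
    name = f"cluster-{len(clusters) + 1}"
    clusters[name] = [(element_id, s) for s in element_sigs] + related
    return [name]


clusters = {"cats": [("c1", frozenset({1, 2, 3}))]}
data_source = {"p1": frozenset({10, 11, 12, 13}), "p2": frozenset({50, 51})}
print(cluster_or_create("new.jpg", [frozenset({10, 11, 12})], clusters, data_source))  # ['cluster-2']
```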

It should be noted that using signatures for tagging multimedia content elements, clustering multimedia content elements, or both, ensures more accurate clustering of multimedia content than, for example, when using manually provided metadata (e.g., tags provided by users). For instance, in order to cluster an image of a sports car into an appropriate cluster, it may be desirable to recognize the particular model of the car. However, in most cases the model of the car would not be part of the metadata associated with the multimedia content element (the image). Moreover, the car may be shown in the image at angles different from those of a specific photograph of the car that is available as a search item. The signature generated for that image would enable accurate recognition of the model of the car, because the signatures generated according to the disclosed embodiments allow for recognition and classification of multimedia content elements in applications such as content tracking, video filtering, multimedia taxonomy generation, video fingerprinting, speech-to-text, audio classification, element recognition, video and image search, and any other application requiring content-based signature generation and matching for large content volumes, such as the web and other large-scale databases.

The database 150 stores multimedia content elements, clusters of multimedia content elements, or both. In the example network diagram 100 shown in FIG. 1, the clustering system 130 communicates with the database 150 through the network 120. In other non-limiting configurations, the clustering system 130 may be directly connected to the database 150. The database 150 may be accessible to, e.g., the user device 110, other user devices (not shown), or both, thereby allowing for retrieval of clusters from the database 150 by such user devices.

It should also be noted that the signature generator system 140 and the DCC system 160 are shown in FIG. 1 as being directly connected to the clustering system 130 merely for simplicity purposes and without limitation on the disclosed embodiments. The signature generator system 140, the DCC system 160, or both, may be included in the clustering system 130 or communicatively connected to the clustering system 130 over, e.g., the network 120, without departing from the scope of the disclosure.

It should be further noted that the clustering is described as being performed by the clustering system 130 merely for simplicity purposes and without limitation on the disclosed embodiments. The clustering may be equally performed locally by, e.g., the user device 110, without departing from the scope of the disclosure. In such a case, the user device 110 may include the clustering system 130, the signature generator system 140, the DCC system 160, or any combination thereof, or may otherwise be configured to perform any or all of the processes performed by such systems. Further, local clustering by the user device 110 may be based on multimedia content clusters stored locally on the user device 110.

As a non-limiting example for local clustering by the user device 110, the clustering may be based on clusters of images in a photo library stored on the user device 110 such that new images may be clustered in real-time and, therefore, subsequently searched by a user of the user device 110. Thus, when, for example, the user of the user device 110 captures an image of his dog named “Lucky,” the user device 110 may cluster the image with other images of the dog Lucky stored in the user device 110 such that, when the user searches through the user device 110 for images using the query “lucky,” the captured image is returned along with other clustered images of the dog Lucky.
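Once the new photo has been clustered on the device, retrieving it for the query “lucky” could be as simple as a case-insensitive lookup over tag-keyed clusters. The tag name “Lucky”, the file names, and the `search_local` helper are illustrative assumptions; the clustering of the captured image itself would rely on signature matching as described above.

```python
def search_local(query, tag_clusters):
    """Return members of every cluster whose tag equals the query, ignoring case."""
    q = query.casefold()
    results = []
    for tag, members in tag_clusters.items():
        if tag.casefold() == q:
            for member in members:
                if member not in results:
                    results.append(member)
    return results


photo_clusters = {"Lucky": ["IMG_0001.jpg", "IMG_0007.jpg"], "beach": ["IMG_0003.jpg"]}
photo_clusters["Lucky"].append("IMG_0042.jpg")  # newly captured image clustered on-device
print(search_local("lucky", photo_clusters))
```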

FIG. 2 is an example flowchart 200 illustrating a method for clustering multimedia content elements according to an embodiment. The method may be performed, for example, in response to a request to cluster one or more multimedia content elements.

At S205, a clustering trigger event is detected. The clustering trigger event may be or may include, but is not limited to, receiving a request to cluster at least one multimedia content element.

At S210, at least one multimedia content element to be clustered is obtained. In an embodiment, the at least one multimedia content element may be obtained based on a request to cluster the at least one multimedia content element. The request may include the at least one multimedia content element to be clustered, an identifier of one or more of the at least one multimedia content element, an indicator of a location of one or more of the at least one multimedia content element, or a combination thereof.

At S220, at least one signature is generated for each multimedia content element. Each generated signature may be robust to noise and distortion. In an embodiment, the signatures are generated by a signature generator system as described further herein below with respect to FIGS. 3 and 4. In another embodiment, S220 may include sending, to a signature generator system (e.g., the signature generator system 140, FIG. 1), the multimedia content element and receiving, from the signature generator system, the at least one signature generated for each multimedia content element.
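The signature generator of FIGS. 3 and 4 is not detailed in this section. As a stand-in only, the sketch below uses random-hyperplane hashing, a well-known locality-sensitive hashing technique and not the disclosed method, to show what robustness to noise means in practice: a slightly perturbed input keeps nearly the same active bits, while a very different input does not. The numeric feature vector, the 64-bit size, and the fixed seed are assumptions.

```python
import random


def make_projections(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes in dim dimensions (fixed seed for repeatability)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(features, projections):
    """Set of hyperplane indices on whose positive side the feature vector lies."""
    return frozenset(
        i for i, plane in enumerate(projections)
        if sum(f * w for f, w in zip(features, plane)) > 0
    )


def jaccard(a, b):
    """Fraction of active bits shared by two signatures."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


planes = make_projections(dim=8, n_bits=64)
original = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
noisy = [x + 0.01 for x in original]  # slightly distorted copy
inverted = [-x for x in original]     # very different input
print(jaccard(signature(original, planes), signature(noisy, planes)))     # close to 1.0
print(jaccard(signature(original, planes), signature(inverted, planes)))  # 0.0
```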

At optional S230, at least one tag is generated for the at least one multimedia content element based on the generated at least one signature. Each tag is a textual index term assigned to the multimedia content element as described further herein above. As non-limiting examples of tags, the tag “me” may be assigned to an image of the user's face, the tag “my dog” may be assigned to an image of a dog, and the tag “my dog and I” may be assigned to an image featuring both the user and a dog.

In an embodiment, S230 may include comparing the generated at least one signature to signatures of a plurality of multimedia content elements having assigned predetermined tags. In a further embodiment, tags of multimedia content elements having signatures that match one or more of the generated at least one signature may be generated as tags for the multimedia content element.
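Copying predetermined tags from already-tagged elements whose signatures match can be sketched as below, reusing the tag examples given for S230. The `(signature, tags)` pairs, the one-signature-per-portion input, the Jaccard rule, and the threshold are hypothetical.

```python
def jaccard(a, b):
    """Fraction of active bits shared by two signatures."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def propagate_tags(element_sigs, labeled, threshold=0.6):
    """labeled: (signature, tags) pairs for elements that already carry predetermined tags."""
    tags = []
    for sig, sig_tags in labeled:
        if any(jaccard(s, sig) >= threshold for s in element_sigs):
            for tag in sig_tags:
                if tag not in tags:
                    tags.append(tag)
    return tags


labeled = [
    (frozenset({1, 2, 3, 4}), ["me"]),
    (frozenset({40, 41, 42}), ["my dog"]),
    (frozenset({90, 91}), ["beach"]),
]
# One signature per portion of the new image: the user's face and a dog.
new_image = [frozenset({1, 2, 3}), frozenset({40, 41, 42})]
print(propagate_tags(new_image, labeled))  # ['me', 'my dog']
```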

In another embodiment, the at least one tag may be generated based on metadata of concept structures matching the at least one multimedia content element to be clustered. To this end, in a further embodiment, S230 may further include obtaining, from a DCC system (e.g., the DCC system 160, FIG. 1), at least one concept structure matching each multimedia content element to be clustered. In yet a further embodiment, S230 may further include querying the DCC system with respect to the signatures for each multimedia content element to be clustered.

At S240, at least one multimedia content element cluster is determined. Each determined multimedia content element cluster includes a plurality of multimedia content elements sharing a common concept. Each of the at least one multimedia content element also shares the common concept of the multimedia content element cluster. The common concept of a multimedia content element may be a collection of signatures representing elements of the unstructured data and metadata describing the concept. The common concept may represent an item or aspect in the multimedia content element such as, but not limited to, an object, a person, an animal, a pattern, a color, a background, a character, a sub textual aspect, a meta aspect, words, sounds, voices, motions, combinations thereof, and the like. Multimedia content elements may share a common concept when each of the multimedia content elements is associated with at least one signature, at least one portion of a signature, at least one tag, or a combination thereof, that is common to all of the multimedia content elements sharing a common concept.
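The sharing condition above (a signature, a portion of a signature, or a tag common to all of the elements) can be checked with set intersections. In this sketch each element is a hypothetical dictionary with a `tags` set and a `signature` set of active-bit indices, and a shared “portion of a signature” is approximated as the active bits common to all signatures.

```python
import operator
from functools import reduce


def common_concepts(elements):
    """Tags and signature bits shared by every element (elements must be non-empty)."""
    shared_tags = reduce(operator.and_, (set(e["tags"]) for e in elements))
    shared_bits = reduce(operator.and_, (frozenset(e["signature"]) for e in elements))
    return shared_tags, shared_bits


def share_common_concept(elements):
    """True when at least one tag or one signature bit is common to all elements."""
    shared_tags, shared_bits = common_concepts(elements)
    return bool(shared_tags or shared_bits)


skydiving = [
    {"tags": {"skydiving", "sky"}, "signature": frozenset({1, 2, 3})},
    {"tags": {"skydiving", "parachute"}, "signature": frozenset({2, 3, 9})},
]
print(common_concepts(skydiving))
```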

As non-limiting examples, the common concept may represent, e.g., a Labrador retriever dog shown in images or videos, a voice of the actor Daniel Radcliffe that can be heard in audio or videos, a motion including swinging of a baseball bat shown in videos, a subtext of playing chess, an indication that an image is a “selfie,” and the like.

The common concept may be further based on levels of granularity. For example, the common concept may be related to cats generally such that any cats shown or heard in multimedia content elements are considered a common concept, or may be related to a particular cat such that only visual or audio representations of that cat are considered to be a common concept. Such granularity may depend on, e.g., a threshold for matching signatures, tags, or both, such that higher thresholds result in more granular results.
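The effect of the matching threshold on granularity can be illustrated with a simple single-pass "leader" clustering, which is a swapped-in technique chosen for brevity rather than the clustering method of the disclosure. The signatures, the "generic cat" feature set, and the Jaccard measure are all hypothetical.

```python
from typing import FrozenSet, List


def overlap(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Jaccard overlap (an assumed matching measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def leader_cluster(signatures: List[FrozenSet[int]], threshold: float) -> List[List[int]]:
    """Single-pass "leader" clustering: each signature joins the first cluster
    whose leader it matches at or above the threshold, else starts a new one."""
    leaders: List[FrozenSet[int]] = []
    clusters: List[List[int]] = []
    for idx, sig in enumerate(signatures):
        for c, leader in enumerate(leaders):
            if overlap(sig, leader) >= threshold:
                clusters[c].append(idx)
                break
        else:
            leaders.append(sig)
            clusters.append([idx])
    return clusters


if __name__ == "__main__":
    generic_cat = {1, 2, 3}  # hypothetical features shared by all cats
    sigs = [
        frozenset(generic_cat | {10, 11}),      # cat A
        frozenset(generic_cat | {10, 11, 12}),  # cat A, another photo
        frozenset(generic_cat | {20, 21}),      # cat B
    ]
    print(leader_cluster(sigs, 0.4))  # [[0, 1, 2]]: cats in general
    print(leader_cluster(sigs, 0.8))  # [[0, 1], [2]]: a particular cat
```

With the lower threshold, cat B (overlap 3/7 with cat A) joins the same cluster; with the higher threshold, only the two photos of cat A (overlap 5/6) are grouped, which mirrors the statement that higher thresholds yield more granular results.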

In another embodiment, the determined at least one multimedia content element may include only multimedia content elements of the same type as the obtained multimedia content element. For example, if the obtained multimedia content element is an image, only other images having a common concept may be determined. In yet another embodiment, multimedia content elements of different types may be determined. The types of multimedia content elements that may be determined may be based on, e.g., one or more rules.
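A rule-based type filter of this kind might look like the following sketch. The rule table, the `(name, type)` candidate tuples, and the same-type fallback are hypothetical choices made for illustration; the disclosure does not specify any particular rules.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical rule table: candidate types that may be clustered with a given input type.
TYPE_RULES: Dict[str, Set[str]] = {
    "image": {"image"},
    "audio": {"audio", "video"},
    "video": {"video", "image", "audio"},
}


def eligible_candidates(
    input_type: str,
    candidates: List[Tuple[str, str]],
    rules: Dict[str, Set[str]] = TYPE_RULES,
) -> List[Tuple[str, str]]:
    """Keep (name, type) candidates whose type the rules allow; with no rule
    for the input type, fall back to same-type-only clustering."""
    allowed = rules.get(input_type, {input_type})
    return [c for c in candidates if c[1] in allowed]
```

Under these example rules, an image is compared only with other images, whereas an audio clip may also be clustered with videos (which can carry the same voice).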

As a non-limiting example of a common concept, for an image showing a person wearing a parachute with the sky in the background, a tag for the image may be “skydiving.” The common concept may be the sub textual aspect “skydiving” indicating an activity that the person shown in the image is performing. Other multimedia content elements showing or otherwise illustrating people skydiving may also be associated with the tag “skydiving” and, therefore, the sub textual aspect “skydiving” is a common concept of the multimedia content elements.

As another non-limiting example of a common concept, for an audio clip in which a user recites information that the user wishes to reference later, a portion of a signature generated for the audio clip may be related to the meta aspect “note to self.” In particular, a portion of the signature may be generated based on the words “note to self” spoken at the beginning of the audio clip. Other multimedia content elements may also have portions of signatures related to the concept “note to self” (e.g., other content illustrating the words “note to self” or similar phrases) and, therefore, the meta aspect “note to self” is a common concept of the multimedia content elements. In a further example, only multimedia content elements related to the particular user heard in the obtained multimedia content element (i.e., multimedia content elements featuring a voice of the user who recorded the obtained multimedia content element) may be determined as having a concept in common with the obtained multimedia content element such that the cluster includes only notes to self by the same user.

In an embodiment, if no existing multimedia content element clusters having a common concept with the multimedia content element can be found (e.g., if no multimedia content element clusters are associated with signatures or tags matching the generated at least one signature or the generated at least one tag above a predetermined threshold), S240 may include generating a new multimedia content element cluster. In a further embodiment, generating the new multimedia content element cluster may include searching in one or more data sources to identify multimedia content elements sharing a common concept with the obtained multimedia content element. The identified multimedia content elements may be clustered with the obtained multimedia content element.
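The determine-or-create logic of S240 and the subsequent addition at S250 can be sketched as below. The sketch assumes, hypothetically, that each cluster is represented by a signature portion standing for its common concept and that matching uses a Jaccard threshold; the search of external data sources for seed members of a new cluster is omitted. All names are illustrative.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class Cluster:
    """Hypothetical cluster record: the signature portion standing for its
    common concept, plus the names of its member elements."""
    concept: FrozenSet[int]
    members: List[str] = field(default_factory=list)


def overlap(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Jaccard overlap (an assumed matching measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def determine_clusters(
    clusters: List[Cluster], signature: FrozenSet[int], threshold: float = 0.5
) -> List[Cluster]:
    """S240 sketch: return every existing cluster whose concept matches the
    signature; if none does, create (and register) a new cluster."""
    matches = [c for c in clusters if overlap(signature, c.concept) >= threshold]
    if not matches:
        new_cluster = Cluster(concept=signature)
        clusters.append(new_cluster)
        matches = [new_cluster]
    return matches


def add_element(matches: List[Cluster], name: str) -> None:
    """S250 sketch: add the element to each determined cluster."""
    for c in matches:
        c.members.append(name)
```

For example, an element whose signature overlaps an existing cluster's concept by 0.75 is added to that cluster, while an element with no sufficiently similar cluster causes a new single-member cluster to be created.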

At S250, the at least one multimedia content element is added to each determined multimedia content element cluster or to the newly generated multimedia content element cluster. In an embodiment, S250 may further include storing the at least one multimedia content element cluster with the added at least one multimedia content element in a storage (e.g., the database 150 of FIG. 1, a data source such as a web server, etc.). As a non-limiting example, the cluster may be stored in a server of a social media platform, thereby enabling other users to find the cluster during searches. Each cluster may be stored separately such that different groupings of multimedia content elements are stored in separate locations. For example, different clusters of multimedia content elements may be stored in different folders.
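Storing each cluster in its own folder could be as simple as the following sketch. The on-disk layout (one directory per cluster containing a `members.json` list of member names) is a hypothetical choice for illustration, not a format described in the disclosure.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def store_clusters(root: str, clusters: Dict[str, List[str]]) -> Dict[str, Path]:
    """Write each cluster to its own folder under root so that different
    groupings end up in separate locations (hypothetical on-disk layout)."""
    base = Path(root)
    paths: Dict[str, Path] = {}
    for cluster_id, members in clusters.items():
        folder = base / cluster_id
        folder.mkdir(parents=True, exist_ok=True)
        (folder / "members.json").write_text(json.dumps(sorted(members)))
        paths[cluster_id] = folder
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        stored = store_clusters(tmp, {"dogs": ["b.jpg", "a.jpg"], "food": ["c.jpg"]})
        print(json.loads((stored["dogs"] / "members.json").read_text()))  # ['a.jpg', 'b.jpg']
```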

At S260, it is determined if additional multimedia content elements are to be clustered and, if so, execution continues with S205; otherwise, execution terminates.

Clustering of the multimedia content elements allows for organizing the multimedia content elements based on subject matter represented by various concepts. Such organization may be useful for, e.g., organizing photos captured by a user of a smart phone based on common subject matter. As a non-limiting example, images showing dogs, a football game, and food may be organized into different collections and, for example, stored in separate folders on the smart phone. Such organization may be particularly useful for social media or other content sharing applications, as multimedia content being shared can be organized and shared with respect to content. Additionally, such organization may be useful for subsequent retrieval, particularly when the organization is based on tags. As noted above, using signatures to classify the multimedia content elements typically results in more accurate identification of multimedia content elements sharing similar content.

It should be noted that the embodiments described herein above with respect to FIG. 2 are discussed as clustering multimedia content elements in series merely for simplicity and without limitation on the disclosure. Multiple multimedia content elements may be clustered in parallel without departing from the scope of the disclosure. Further, the clustering method discussed above can be performed by the clustering system 130, or locally by a user device (e.g., the user device 110, FIG. 1).
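One way to cluster in parallel is sketched below: signatures are generated concurrently, and cluster assignment is then done sequentially so that the shared cluster store is never updated by two workers at once. `generate_signature` and `assign` are placeholders that a caller would supply (e.g., a request to the signature generator system 140 and the S240/S250 logic). Threads are an assumption that suits I/O-bound calls; for CPU-bound local signature generation a `ProcessPoolExecutor` may fit better.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

E = TypeVar("E")  # element type
S = TypeVar("S")  # signature type
R = TypeVar("R")  # assignment result type


def cluster_all(
    elements: Sequence[E],
    generate_signature: Callable[[E], S],
    assign: Callable[[E, S], R],
    max_workers: int = 4,
) -> List[R]:
    """Generate signatures for all elements in parallel, then assign them to
    clusters one at a time so the shared cluster store is never updated concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so signatures line up with elements.
        signatures = list(pool.map(generate_signature, elements))
    return [assign(e, s) for e, s in zip(elements, signatures)]
```

With a toy signature function such as `frozenset` over a string's characters and an `assign` that returns `(element, len(signature))`, the inputs `["ab", "abc"]` produce `[("ab", 2), ("abc", 3)]`.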

FIGS. 3 and 4 illustrate the generation of signatures for the multimedia content elements by the signature generator system 140 according to an embodiment. An example high-level description of the process for large scale matching is depicted in FIG. 3. In this example, the matching is for video content.

Video content segments 2 from a Master database (DB) 6 and a Target DB 1 are processed in parallel by a large number of independent computational Cores 3 that constitute an architecture for generating the Signatures (hereinafter the “Architecture”). Further details on the computational Cores generation are provided below. The independent Cores 3 generate a database of Robust Signatures and Signatures 4 for Target content-segments 5 and a database of Robust Signatures and Signatures 7 for Master content-segments 8. An exemplary and non-limiting process of signature generation for an audio component is shown in detail in FIG. 4. Finally, Target Robust Signatures and/or Signatures are effectively matched, by a matching algorithm 9, to Master Robust Signatures and/or Signatures database to find all matches between the two databases.
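The final matching step between the Target and Master databases can be illustrated with a brute-force comparison. The sketch assumes, hypothetically, that signatures are sets of fired core indices keyed by segment identifier and that a pair matches when the Jaccard overlap meets a threshold. It does not reproduce the matching algorithm 9, which is not detailed here; its pairwise comparison is quadratic, so a large-scale system would likely need an index instead.

```python
from typing import Dict, FrozenSet, List, Tuple

Signature = FrozenSet[int]


def overlap(a: Signature, b: Signature) -> float:
    """Jaccard overlap (an assumed matching measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_databases(
    target: Dict[str, Signature], master: Dict[str, Signature], threshold: float = 0.7
) -> List[Tuple[str, str]]:
    """Brute-force matching: every (target, master) pair whose signatures
    overlap at or above the threshold."""
    return [
        (t_id, m_id)
        for t_id, t_sig in target.items()
        for m_id, m_sig in master.items()
        if overlap(t_sig, m_sig) >= threshold
    ]
```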

To demonstrate an example of the signature generation process, it is assumed, merely for the sake of simplicity and without limitation on the generality of the disclosed embodiments, that the signatures are based on a single frame, leading to certain simplification of the computational cores generation. The Matching System is extensible to signature generation that captures the dynamics between frames.

The Signatures' generation process is now described with reference to FIG. 4. The first step in the process of signature generation from a given speech segment is to break down the speech segment into K patches 14 of random length P and random position within the speech segment 12. The breakdown is performed by the patch generator component 21. The values of the number of patches K, the random length P, and the random position parameters are determined based on optimization, considering the tradeoff between accuracy rate and the number of fast matches required in the flow process of the clustering system 130 and the signature generator system 140. Thereafter, all the K patches are injected in parallel into all computational Cores 3 to generate K response vectors 22, which are fed into a signature generator system 23 to produce a database of Robust Signatures and Signatures 4.
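The patch breakdown can be sketched as follows. The sketch operates on a generic 1-D sequence of samples and assumes, for illustration only, that the length P is drawn uniformly from a caller-supplied range `[min_len, max_len]` and the start position uniformly from all positions that keep the patch inside the segment. The optimized values of K, P, and the position parameters are not given in the disclosure, so the function and its parameters are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def make_patches(
    segment: Sequence[float], k: int, min_len: int, max_len: int, seed: Optional[int] = None
) -> List[list]:
    """Break a 1-D segment into k patches of random length and random position."""
    if not 1 <= min_len <= max_len <= len(segment):
        raise ValueError("require 1 <= min_len <= max_len <= len(segment)")
    rng = random.Random(seed)  # seeded for reproducible patch layouts
    patches = []
    for _ in range(k):
        length = rng.randint(min_len, max_len)          # random length P
        start = rng.randint(0, len(segment) - length)   # random position
        patches.append(list(segment[start:start + length]))
    return patches
```

Fixing the seed makes the patch layout reproducible, which matters if the same patch positions must be applied to both Target and Master content.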

To generate Robust Signatures, i.e., Signatures that are robust to additive noise, a frame $i$ is injected into all $L$ Computational Cores 3 (where $L$ is an integer equal to or greater than 1). The Cores 3 then generate two binary response vectors: $\vec{S}$, which is a Signature vector, and $\vec{RS}$, which is a Robust Signature vector.

For generation of signatures robust to additive noise, such as White-Gaussian-Noise, scratch, etc., but not robust to distortions, such as crop, shift and rotation, etc., a core $C_i = \{n_i\}$ ($1 \le i \le L$) may consist of a single leaky integrate-to-threshold unit (LTU) node or of multiple nodes. The node $n_i$ equations are:

$$V_i = \sum_j w_{ij} k_j$$

$$n_i = \theta(V_i - Th_x)$$

where $\theta$ is a Heaviside step function; $w_{ij}$ is a coupling node unit (CNU) between node $i$ and image component $j$ (for example, the grayscale value of a certain pixel $j$); $k_j$ is an image component $j$ (for example, the grayscale value of a certain pixel $j$); $Th_x$ is a constant threshold value, where $x$ is $S$ for Signature and $RS$ for Robust Signature; and $V_i$ is a Coupling Node Value.
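The two node equations can be evaluated directly, as in the sketch below. It models only the static equations shown (the weighted sum followed by a step), not any leak or temporal dynamics of an LTU, and it adopts $\theta(0) = 0$ as an assumed convention. In the usage example, $Th_{RS}$ is chosen above $Th_S$, so every node set in the Robust Signature is also set in the Signature.

```python
from typing import List, Sequence, Tuple


def heaviside(x: float) -> int:
    """Step function theta; theta(0) = 0 is an assumed convention."""
    return 1 if x > 0 else 0


def coupling_values(weights: Sequence[Sequence[float]], components: Sequence[float]) -> List[float]:
    """V_i = sum_j w_ij * k_j for every node i."""
    return [sum(w * k for w, k in zip(row, components)) for row in weights]


def signature_pair(
    weights: Sequence[Sequence[float]], components: Sequence[float], th_s: float, th_rs: float
) -> Tuple[List[int], List[int]]:
    """Binary Signature (threshold Th_S) and Robust Signature (threshold Th_RS)
    vectors computed from the same coupling node values: n_i = theta(V_i - Th_x)."""
    values = coupling_values(weights, components)
    signature = [heaviside(v - th_s) for v in values]
    robust = [heaviside(v - th_rs) for v in values]
    return signature, robust


if __name__ == "__main__":
    W = [[1, 0], [0, 1], [1, 1]]  # hypothetical CNU weights for three nodes
    k = [0.2, 0.9]                # hypothetical image components
    # V = [0.2, 0.9, 1.1]
    print(signature_pair(W, k, th_s=0.5, th_rs=1.0))  # ([0, 1, 1], [0, 0, 1])
```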

The threshold values $Th_x$ are set differently for Signature generation and for Robust Signature generation. For example, for a certain distribution of $V_i$ values (for the set of nodes), the thresholds for Signature ($Th_S$) and Robust Signature ($Th_{RS}$) are set apart, after optimization, according to one or more of the following criteria:

1: For $V_i > Th_{RS}$:

$$1 - p(V > Th_S) = 1 - (1 - \epsilon)^l \ll 1$$

i.e., given that $l$ nodes (cores) constitute a Robust Signature of a certain image $I$, the probability that not all of these $l$ nodes will belong to the Signature of the same, but noisy, image is sufficiently low (according to a system's specified accuracy).

2:

$$p(V_i > Th_{RS}) \approx l/L$$

i.e., approximately $l$ out of the total $L$ nodes can be found to generate a Robust Signature according to the above definition.

3: Both Robust Signature and Signature are generated for a certain frame $i$.
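Criteria 1 and 2 can be explored numerically. The sketch below sets $Th_{RS}$ so that exactly $l$ of $L$ distinct coupling values exceed it (criterion 2) and sets $Th_S$ lower so that $m > l$ values exceed it. It then estimates, by Monte Carlo, the probability that not all Robust Signature nodes still exceed $Th_S$ under noise (the quantity criterion 1 requires to be $\ll 1$). Modeling the noise as Gaussian noise added directly to each $V_i$ is an assumption made for illustration, as are the function names and parameters.

```python
import random
from typing import List, Tuple


def pick_thresholds(values: List[float], l: int, m: int) -> Tuple[float, float]:
    """Return (Th_S, Th_RS) so that, for distinct values, exactly l of the L
    values exceed Th_RS (p(V_i > Th_RS) = l/L, criterion 2) and m > l exceed Th_S."""
    if not 0 < l < m < len(values):
        raise ValueError("require 0 < l < m < L")
    ranked = sorted(values, reverse=True)
    return ranked[m], ranked[l]


def miss_probability(
    values: List[float], th_s: float, th_rs: float, noise_sd: float,
    trials: int = 1000, seed: int = 0,
) -> float:
    """Monte Carlo estimate of the probability that NOT all Robust Signature
    nodes (V_i > Th_RS) still exceed Th_S after additive Gaussian noise on V_i;
    criterion 1 asks for this to be much smaller than 1."""
    rng = random.Random(seed)
    robust = [v for v in values if v > th_rs]
    misses = sum(
        1
        for _ in range(trials)
        if not all(v + rng.gauss(0.0, noise_sd) > th_s for v in robust)
    )
    return misses / trials
```

With 100 evenly spaced values in $[0, 0.99]$, $l = 10$, and $m = 50$, the gap between $Th_{RS}$ and $Th_S$ is large relative to a noise level of 0.01, so the estimated miss probability is 0; at a noise level of 1.0 the criterion clearly fails, signaling that the thresholds would need to be set further apart or $l$ reduced.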

It should be understood that the generation of a signature is unidirectional and typically yields lossy compression, where the characteristics of the compressed data are maintained but the uncompressed data cannot be reconstructed. Therefore, a signature can be used for comparison with another signature without the need for comparison to the original data. The detailed description of the Signature generation can be found in U.S. Pat. Nos. 8,326,775 and 8,312,031, assigned to the common assignee, which are hereby incorporated by reference for all the useful information they contain.
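Because signatures are compact binary vectors, two of them can be compared without access to the underlying content. A minimal sketch follows, using Hamming distance as a swapped-in comparison measure and a 10% mismatch tolerance as an assumed matching rule; neither is specified by the disclosure.

```python
from typing import Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length binary signatures differ."""
    if len(a) != len(b):
        raise ValueError("signatures must have equal length")
    return sum(x != y for x, y in zip(a, b))


def signatures_match(a: Sequence[int], b: Sequence[int], max_fraction: float = 0.1) -> bool:
    """Declare a match when at most max_fraction of the bits differ (assumed rule)."""
    return hamming(a, b) <= max_fraction * len(a)
```

For example, two 20-bit signatures differing in a single bit match under the 10% rule, whereas two 4-bit signatures differing in two bits do not.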

Computational Core generation is a process of definition, selection, and tuning of the parameters of the cores for a certain realization in a specific system and application. The process is based on several design considerations, such as:

(a) The Cores should be designed so as to obtain maximal independence, i.e., the projection from a signal space should generate a maximal pair-wise distance between any two cores' projections into a high-dimensional space.
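Design consideration (a) can be illustrated with a random search that favors core sets whose response patterns are far apart. This is a swapped-in stand-in, not the core-generation process of the referenced patents: each hypothetical core is a linear-threshold unit with Gaussian random weights and a threshold of 0, its "projection" is its binary response over a fixed set of probe signals, and the search keeps the candidate set with the largest minimum pairwise Hamming distance.

```python
import itertools
import random
from typing import List, Sequence, Tuple


def core_projection(weights: Sequence[float], probes: Sequence[Sequence[float]]) -> List[int]:
    """Binary response pattern of one linear-threshold core over a set of probe signals."""
    return [1 if sum(w * x for w, x in zip(weights, p)) > 0 else 0 for p in probes]


def min_pairwise_distance(projections: Sequence[Sequence[int]]) -> int:
    """Smallest Hamming distance between any two cores' response patterns."""
    return min(
        sum(a != b for a, b in zip(p, q))
        for p, q in itertools.combinations(projections, 2)
    )


def pick_independent_cores(
    num_cores: int, dim: int, probes: Sequence[Sequence[float]],
    candidates: int = 50, seed: int = 0,
) -> Tuple[List[List[float]], int]:
    """Random search: keep the candidate core set whose response patterns have
    the largest minimum pairwise distance."""
    if num_cores < 2 or candidates < 1:
        raise ValueError("need at least two cores and one candidate set")
    rng = random.Random(seed)
    best: List[List[float]] = []
    best_score = -1
    for _ in range(candidates):
        cores = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_cores)]
        score = min_pairwise_distance([core_projection(c, probes) for c in cores])
        if score > best_score:
            best, best_score = cores, score
    return best, best_score
```

Maximizing the minimum (rather than the average) pairwise distance penalizes any two cores that respond almost identically, which is one reading of "maximal independence" between any two cores.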

(b) The Cores should be optimally designed for the type of signals, i.e., the Cores should be maximally sensitive to the spatio-temporal structure of the injected signal, for example, and in particular, sensitive to local correlations in time and space. Thus, in some cases a core represents a dynamic system, such as in state space, phase space, edge of chaos, etc., which are uniquely used herein to exploit their maximal computational power.

(c) The Cores should be optimally designed with regard to invariance to a set of signal distortions of interest in relevant applications.

The Computational Core generation and the process for configuring such cores are discussed in more detail in the above-referenced U.S. Pat. No. 8,655,801.

FIG. 5 is an example block diagram illustrating a clustering system 130 implemented according to an embodiment. The clustering system 130 includes a processing circuitry 510 coupled to a memory 520, a storage 530, and a network interface 540. In an embodiment, the components of the clustering system 130 may be communicatively connected via a bus 550.

The processing circuitry 510 may be realized as one or more hardware logic components and circuits. For example, and without limitation, illustrative types of hardware logic components that can be used include field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), system-on-a-chip systems (SOCs), general-purpose microprocessors, microcontrollers, digital signal processors (DSPs), and the like, or any other hardware logic components that can perform calculations or other manipulations of information. In an embodiment, the processing circuitry 510 may be realized as an array of at least partially statistically independent computational cores. The properties of each computational core are set independently of those of each other core, as described further hereinabove.

The memory 520 may be volatile (e.g., RAM, etc.), non-volatile (e.g., ROM, flash memory, etc.), or a combination thereof. In one configuration, computer readable instructions to implement one or more embodiments disclosed herein may be stored in the storage 530.

In another embodiment, the memory 520 is configured to store software. Software shall be construed broadly to mean any type of instructions, whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise. Instructions may include code (e.g., in source code format, binary code format, executable code format, or any other suitable format of code). The instructions, when executed by the processing circuitry 510, cause the processing circuitry 510 to perform the various processes described herein. Specifically, the instructions, when executed, cause the processing circuitry 510 to perform clustering of multimedia content elements as described herein.

The storage 530 may be magnetic storage, optical storage, and the like, and may be realized, for example, as flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVDs), or any other medium which can be used to store the desired information.

The network interface 540 allows the clustering system 130 to communicate with the signature generator system 140 for the purpose of, for example, sending multimedia content elements, receiving signatures, and the like. Additionally, the network interface 540 allows the clustering system 130 to communicate with the user device 110 in order to obtain multimedia content elements to be clustered.

It should be understood that the embodiments described herein are not limited to the specific architecture illustrated in FIG. 5, and other architectures may be equally used without departing from the scope of the disclosed embodiments. In particular, the clustering system 130 may further include a signature generator system configured to generate signatures as described herein without departing from the scope of the disclosed embodiments.

It should be understood that any reference to an element herein using a designation such as “first,” “second,” and so forth does not generally limit the quantity or order of those elements. Rather, these designations are generally used herein as a convenient method of distinguishing between two or more elements or instances of an element. Thus, a reference to first and second elements does not mean that only two elements may be employed there or that the first element must precede the second element in some manner. Also, unless stated otherwise, a set of elements comprises one or more elements.

As used herein, the phrase “at least one of” followed by a listing of items means that any of the listed items can be utilized individually, or any combination of two or more of the listed items can be utilized. For example, if a step in a method is described as including “at least one of A, B, and C,” the step can include A alone; B alone; C alone; A and B in combination; B and C in combination; A and C in combination; or A, B, and C in combination.

The various embodiments disclosed herein can be implemented as hardware, firmware, software, or any combination thereof. Moreover, the software is preferably implemented as an application program tangibly embodied on a program storage unit or computer readable medium consisting of parts, or of certain devices and/or a combination of devices. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (“CPUs”), a memory, and input/output interfaces. The computer platform may also include an operating system and microinstruction code. The various processes and functions described herein may be either part of the microinstruction code or part of the application program, or any combination thereof, which may be executed by a CPU, whether or not such a computer or processor is explicitly shown. In addition, various other peripheral units may be connected to the computer platform such as an additional data storage unit and a printing unit. Furthermore, a non-transitory computer readable medium is any computer readable medium except for a transitory propagating signal.

All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the principles of the disclosed embodiment and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the disclosed embodiments, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure. 

We claim:
 1. A method for clustering multimedia content, comprising: detecting at least one clustering trigger event related to at least one multimedia content element to be clustered; generating at least one signature for the at least one multimedia content element, each signature representing at least a portion of the at least one multimedia content element; following the generating of the at least one signature, generating, based on the generated at least one signature, at least one tag for the at least one multimedia content element; determining, based on the generated at least one signature, at least one multimedia content element cluster, wherein each multimedia content element cluster includes a plurality of clustered multimedia content elements sharing at least one common concept with the at least one multimedia content element; and adding, to each determined cluster, the at least one multimedia content element.
 2. The method of claim 1, wherein the at least one multimedia content element cluster is determined based further on the generated at least one tag.
 3. The method of claim 2, wherein each multimedia content element cluster includes a plurality of multimedia content elements associated with the generated at least one tag.
 4. The method of claim 1, further comprising: querying, with respect to the generated at least one signature, a deep content classification system to obtain at least one concept structure matching the at least one multimedia content element, each concept structure including a signature reduced cluster and metadata, and generating the at least one tag for the at least one multimedia content element based on the metadata of the at least one concept structure.
 5. The method of claim 1, wherein each multimedia content element cluster is associated with at least one portion of a signature that is common to the plurality of clustered multimedia content elements of the multimedia content element cluster and to the at least one multimedia content element.
 6. The method of claim 1, further comprising: determining, based on the generated at least one signature, whether an existing multimedia content element cluster sharing a common concept with the multimedia content element can be found; and generating another multimedia content element cluster, when it is determined that an existing multimedia content element cluster sharing a common concept with the multimedia content element cannot be found.
 7. The method of claim 1, wherein the at least one signature is generated via a signature generator system, wherein the signature generator system includes a plurality of at least partially statistically independent computational cores, wherein the properties of each computational core are set independently of properties of each other computational core.
 8. The method of claim 1, wherein the detected at least one clustering trigger event includes receiving a request to cluster the at least one multimedia content element, wherein the request includes at least one of: the at least one multimedia content element, at least one identifier of the at least one multimedia content element, and at least one location of the at least one multimedia content element.
 9. The method of claim 1, further comprising: storing, in a data storage, the at least one cluster including the added at least one multimedia content element, wherein each cluster is stored in a separate location of the data storage.
 10. The method according to claim 1 wherein the common concept represents sub textual information selected from (a) activities or actions being performed, and (b) relationships among individuals shown in the at least one multimedia content element.
 11. The method according to claim 1 wherein the common concept represents a meta aspect indicating information about an acquisition of the at least one multimedia content element.
 12. The method according to claim 1 wherein the common concept represents a user having a user device that captured the at least one multimedia content element.
 13. The method according to claim 1 wherein the common concept differs from a textual tag.
 14. The method according to claim 1 wherein the common concept represents an item captured in the at least one multimedia content element.
 15. A non-transitory computer readable medium having stored thereon instructions for causing a processing circuitry to execute a process, the process comprising: detecting at least one clustering trigger event related to at least one multimedia content element to be clustered; generating at least one signature for the at least one multimedia content element, each signature representing at least a portion of the at least one multimedia content element; following the generating of the at least one signature, generating, based on the generated at least one signature, at least one tag for the at least one multimedia content element; determining, based on the generated at least one signature, at least one multimedia content element cluster, wherein each multimedia content element cluster includes a plurality of clustered multimedia content elements sharing at least one common concept with the at least one multimedia content element; and adding, to each determined cluster, the at least one multimedia content element.
 16. A system for clustering multimedia content, comprising: a processing circuitry; and a memory, the memory containing instructions that, when executed by the processing circuitry, configure the system to: detect at least one clustering trigger event related to at least one multimedia content element to be clustered; generate at least one signature for the at least one multimedia content element, each signature representing at least a portion of the at least one multimedia content element; generate, following a generating of the at least one signature, and based on the generated at least one signature, at least one tag for the at least one multimedia content element; determine, based on the generated at least one signature, at least one multimedia content element cluster, wherein each multimedia content element cluster includes a plurality of clustered multimedia content elements sharing at least one common concept with the at least one multimedia content element; and add, to each determined cluster, the at least one multimedia content element.
 17. The system of claim 16, wherein the at least one multimedia content element cluster is determined based further on the generated at least one tag.
 18. The system of claim 17, wherein each multimedia content element cluster includes a plurality of multimedia content elements associated with the generated at least one tag.
 19. The system of claim 16, wherein the system is further configured to: query, with respect to the generated at least one signature, a deep content classification system to obtain at least one concept structure matching the at least one multimedia content element, each concept structure including a signature reduced cluster and metadata, and generate the at least one tag for the at least one multimedia content element based on the metadata of the at least one concept structure.
 20. The system of claim 16, wherein each determined cluster is associated with at least one portion of a signature that is common to the clustered multimedia content elements of the cluster and to the at least one multimedia content element. 